Skip to content

feat(install): provision prereqs + fail-fast robustness (CUDA, firewall, Git Bash guard, node/npm checks)#1612

Open
joelteply wants to merge 3 commits into
canaryfrom
feat/install-provisions-prereqs
Open

feat(install): provision prereqs + fail-fast robustness (CUDA, firewall, Git Bash guard, node/npm checks)#1612
joelteply wants to merge 3 commits into
canaryfrom
feat/install-provisions-prereqs

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

Why

Modern-app install behavior: if a prerequisite is detected as necessary and it's reasonable to do so, the install should provide it — idempotently, on every run/update — not merely warn.

Two regressions/gaps this closes:

1. CUDA toolkit (install.sh)

detect_gpu only warned when nvcc was missing, with a wrong comment ("needed for training, not inference"). nvcc is required to build --features cuda (candle-kernels/cudarc compile GPU kernels at build time) — so a CUDA box silently fell back to CPU.

New install_cuda_toolkit():

  • Runs when a CUDA GPU is detected; installs the toolkit from NVIDIA's apt repo.
  • wsl-ubuntu repo under WSL2 (the GPU driver comes from the Windows host — never a Linux driver); ubuntu2404 native; sbsa for aarch64.
  • Targets 12.9 (the Blackwell sm_120 / RTX 5090 floor is 12.8; 12.9 is the newest 12.x with broad cudarc/candle support).
  • Idempotent: re-run skips when a recent-enough toolkit is present.

2. airc inbound firewall (windows-setup-autostart.ps1)

Cross-machine grid routing needs other nodes to reach this box's airc LAN listener (an ephemeral TCP port). Previously this required a manual one-off New-NetFirewallRule paste — the exact thing blocking the keystone's inbound leg. Now the (already-elevated) Windows setup script adds an idempotent inbound allow-rule for airc.exe by program path.

Validation

  • bash -n install.sh clean; PowerShell parser clean on the ps1.
  • Both changes are no-ops on non-CUDA / non-airc boxes and safe to re-run.
  • Infra-script-only change (shell + PowerShell, no TS/Rust) — committed with the precommit config's documented ENABLE_TYPESCRIPT_CHECK=false ENABLE_BROWSER_TEST=false knobs (the inapplicable phases), not --no-verify.

🤖 Generated with Claude Code

…requisites

Modern-app install behavior: detect a needed prerequisite and provide it,
idempotently, on every run/update — instead of merely warning.

install.sh: install_cuda_toolkit() — when a CUDA GPU is detected and nvcc is
missing or below the Blackwell floor (12.8, sm_120 / RTX 5090), install the
CUDA toolkit from NVIDIAs apt repo (wsl-ubuntu repo under WSL2; the GPU driver
comes from the Windows host, never a Linux driver). Idempotent: a re-run skips
when a recent-enough toolkit exists. Replaces the warn-only detect_gpu path,
whose comment wrongly called nvcc training-only — its required to build
--features cuda (candle-kernels/cudarc compile GPU kernels at build time).

windows-setup-autostart.ps1: idempotent inbound firewall allow-rule for
airc.exe so other grid nodes can reach this boxs airc LAN listener (ephemeral
port) — the missing piece that forced a manual netsh paste to close the
cross-machine keystones inbound leg.

Skipped TS-compile + browser precommit phases via the configs documented env
knobs (ENABLE_TYPESCRIPT_CHECK=false ENABLE_BROWSER_TEST=false) — change is
shell + PowerShell infra only, no TS/Rust.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e success

Running tools/scripts/install.sh from Git Bash/MSYS (uname MINGW64_NT) fell
through preflight_detect_platform Darwin/Linux cases to "unknown", then
plowed into apt-based Node/npm steps that cant work there — a cascade of
"command not found" AND a false "node installed" (the echo ran node --version
unconditionally; the npm step | tail masked the missing-binary exit).

- preflight_detect_platform: detect MINGW*/MSYS*/CYGWIN* -> windows-shell.
- install.sh: fail fast on windows-shell/unknown with a clear redirect to WSL2
  (or install.ps1) BEFORE any package step runs.
- install_node: verify node is on PATH after install; error if not.
- npm build step: preflight command -v npm; check npms real exit via
  PIPESTATUS[0] (not tails) and stop on failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@joelteply joelteply changed the title feat(install): provision CUDA toolkit + airc firewall as detected prerequisites feat(install): provision prereqs + fail-fast robustness (CUDA, firewall, Git Bash guard, node/npm checks) Jun 12, 2026
… + pin toolchain

The from-source dev script (tools/scripts/install.sh) billed itself as THE
installer, so users (and agents) ran it — often in Git Bash where it cant
work — instead of the real one-command path. Rewrite its header to redirect
most users to install.ps1 (Windows) / curl install.sh (Linux/macOS), which
handle WSL2 + Docker + GPU in one shot. Pair with the windows-shell fail-fast
guard so the dev script is unmistakable.

Also add rust-toolchain.toml pinning 1.92.0 so the from-source build is
reproducible (continuum-core ICEs rustc >=1.93 on x86_64-linux).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@joelteply

Copy link
Copy Markdown
Contributor Author

Adversarial review — PR #1612

Two-day-old PR, reviewed against canary, with cross-PR context from #1608. Findings below; CI failure analysis at the bottom.

BLOCK — conflicts with PR #1608 (delete autostart script)

tools/scripts/windows-setup-autostart.ps1 is being deleted by PR #1608 (open since 2026-06-11, conflicting). The deletion rationale, in #1608's own words:

  • Registers a SYSTEM scheduled task that "popped a visible terminal window on every login — the malware playbook signature"
  • Brought up vEthernet (WSL) before NlaSvc/WMI finished enumerating real Wi-Fi → Settings>Network freeze on a fresh HP Omen 5090
  • "Silent SYSTEM-level scheduled tasks … erode trust … even when the code is benign"

This PR's section 6 ADDS to that condemned script. Net effect of merging both: rebase conflict, and either:

Resolution required: either (a) hold #1612 until #1608 is decided and relocate the firewall rule onto whatever replacement install path emerges (per #1608 the replacement is wsl.conf [boot] inside the distro — no Windows-side autostart at all), or (b) explicitly close #1608 first. Don't merge #1612 with the firewall block intact while #1608 is open.

RISK — -Profile Any on the airc inbound rule is too broad for a dev laptop

New-NetFirewallRule-Program $aircExe -Protocol TCP -Profile Any

Any = Public + Private + Domain. A developer who sets up auto-start on a laptop then takes it to a coffee shop is now accepting inbound airc TCP from anyone on the public AP. The intent (grid routing between owned nodes) is Private/Domain only.

Fix: -Profile Domain,Private (or document why Public exposure is intentional).

RISK — CUDA toolkit auto-install fires without explicit user opt-in

install_cuda_toolkit is a 3GB silent download triggered any time nvidia-smi reports a GPU. The header docstring announces the install at top, but for a user who clones to peek at the code and runs bash tools/scripts/install.sh on their NVIDIA laptop expecting a quick "from-source" build, this is a surprise 3GB download with sudo prompts.

Fix: prompt before fetching the keyring (Y/N with default N in interactive mode), or honor a CONTINUUM_SKIP_CUDA=1 env-var bypass and advertise it in the YELLOW "will install CUDA toolkit" line during detect_gpu.

RISK — rust-toolchain.toml pin is workspace-wide and unrelated to PR title

This PR's title is about install-script changes; pinning the toolchain to 1.92.0 across the entire workspace is a substrate-affecting change that deserves its own PR (or at minimum its own commit and a sentence in the PR description). The justification (continuum-core ICEs on >= 1.93 for x86_64-linux) is plausible but unverified by reviewers without an issue link / reproducing build log.

Fix: either split into its own PR with the ICE reproduction log, or add the issue link / CI run that reproduces the 1.93 ICE.

NIT — stale "from-source" path in error messages

Header (line 19) correctly says cd continuum && bash tools/scripts/install.sh. But the in-script error text still points to the old layout:

  • Line 31: Usage: bash scripts/install.sh
  • Line 175: cd src && bash scripts/install.sh (in the CAN_SUDO=false branch of install_cuda_toolkit)

A fresh-Linux user who hits the "no terminal for sudo" path follows that hint to a nonexistent path. Fix to cd continuum && bash tools/scripts/install.sh.

NIT — Git Bash guard prompt could be friendlier

The windows-shell branch says:

✗ This is Git Bash / MSYS on Windows — not a Linux environment.

A first-time user who downloaded the repo and double-clicked Git Bash because that's what their muscle memory does on Windows won't know that "MSYS" means them. The redirect to WSL2 / install.ps1 is good; consider leading with the redirect ("Run install.ps1 instead, or open a WSL prompt") rather than the technical distinction. Minor; not blocking.

CI: carl-install-smoke (linux/amd64) is a pre-existing failure, not caused by this PR

Verbatim failing log line:

🦀 continuum-core: Core IPC types (code, persona, rag, ipc, memory, voice, data)
   ❌ Failed: exit=null signal=null
❌ Some bindings failed to generate
✗ dist/cli-bundle.js was NOT created by build:cli (esbuild silently failed?)

This is the ts-rs binding generation step (Rust→TS) for continuum-core failing with exit=null signal=null (process killed, no exit code captured — likely OOM, missing toolchain, or rustc panic). Same failure signature on every canary push since at least 2026-06-10 — checked runs at canary SHAs c109b069, 46832b47, 2a4b996f, 706b623c, d39241ce, 74f8be29, a0149e4c, 14c3e38c, 0fd01a40, dce4b0d1, 8fbc2606, f022f61a. All carl-install-smoke runs on canary in the past two days have failed with this same ❌ Failed: exit=null signal=null from the same ts-rs step.

Verdict: not caused by this PR; rerun won't help. The actual underlying break (likely related to the substrate-first repo layout migration in 2cb63e019 or a missing rustc on the smoke runner) is a separate fix and the smoke gate is currently red across the board. This PR should not be blocked on it, but the bot result remains red until the canary-wide break is fixed.

Summary

NOT APPROVED pending the BLOCK on the #1608 conflict. The script logic (CUDA detection, Git Bash guard, node/npm PIPESTATUS check, rust-toolchain.toml pin) is broadly sound — but the firewall rule is being added to code that the same author has already moved to delete in a parallel PR for malware-pattern reasons, and the firewall rule itself is over-scoped (-Profile Any). Decide #1608 first, then this becomes a tidy 3-NIT review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant